Homework 5
Important:
All answers should be rounded to 3 decimal places, except the last problem, which requires an exact answer.
In this question, we will train a Naive Bayes classifier to predict the class label Y as a function of the input features.
We are given the following 15 training points:
What is the maximum likelihood estimate of the prior P(Y)?
Y | P(Y) |
A | [q1.1] |
B | [q1.2] |
C | [q1.3] |
What are the maximum likelihood estimates of the conditional probability distributions? Fill in the tables below (the second and third are done for you).
Value | Y | P(Value | Y) |
0 | A | [q1.4] |
1 | A | [q1.5] |
0 | B | [q1.6] |
1 | B | [q1.7] |
0 | C | [q1.8] |
1 | C | [q1.9] |
Value | Y | P(Value | Y) |
0 | A | 1.000 |
1 | A | 0.000 |
0 | B | 0.222 |
1 | B | 0.778 |
0 | C | 0.250 |
1 | C | 0.750 |
Value | Y | P(Value | Y) |
0 | A | 0.500 |
1 | A | 0.500 |
0 | B | 0.000 |
1 | B | 1.000 |
0 | C | 0.500 |
1 | C | 0.500 |
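For reference, the maximum likelihood estimates above are just relative frequencies of the training counts. Below is a minimal Python sketch of that counting; it uses a small hypothetical data set rather than the 15 training points from this problem, and the feature names f1 and f2 are placeholders.

```python
from collections import Counter, defaultdict

def mle_naive_bayes(points):
    """points: list of (feature_dict, label) pairs.
    Returns the MLE prior P(Y) and the CPTs P(feature = value | Y)."""
    n = len(points)
    label_counts = Counter(label for _, label in points)
    prior = {y: c / n for y, c in label_counts.items()}

    # P(feature = value | Y = y) = count(feature = value, Y = y) / count(Y = y)
    pair_counts = defaultdict(int)
    for feats, label in points:
        for f, v in feats.items():
            pair_counts[(f, v, label)] += 1
    cpt = {key: c / label_counts[key[2]] for key, c in pair_counts.items()}
    return prior, cpt

# Hypothetical toy data (NOT the 15 training points from the homework):
toy = [({"f1": 0, "f2": 1}, "A"), ({"f1": 1, "f2": 1}, "B"), ({"f1": 0, "f2": 0}, "A")]
prior, cpt = mle_naive_bayes(toy)
print(round(prior["A"], 3), round(cpt[("f1", 0, "A")], 3))  # 0.667 1.0
```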
Following question 1, now consider a new data point. Use your classifier to determine the joint probability of each cause Y with this new data point, along with the posterior probability of Y given the new data:
Y | Joint probability |
A | [q2.1] |
B | [q2.2] |
C | [q2.3] |
Y | Posterior probability |
A | [q2.4] |
B | [q2.5] |
C | [q2.6] |
What label does your classifier give to the new data point? (Break ties alphabetically.) Enter a single capital letter.
[q2.7]
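The joint and posterior entries above follow the standard Naive Bayes decomposition: P(Y = y, x) = P(y) · Π_i P(x_i | y), and the posterior is that joint renormalized over the classes. Here is a sketch that reuses the dictionaries from the previous snippet; the query point is a hypothetical placeholder, not the new point from the question.

```python
def joint_and_posterior(prior, cpt, x):
    """x: dict of feature -> observed value.
    Returns P(Y = y, x) for each y and the normalized posterior P(Y = y | x)."""
    joint = {}
    for y, p_y in prior.items():
        p = p_y
        for f, v in x.items():
            p *= cpt.get((f, v, y), 0.0)  # an unseen (feature, value, class) combination has MLE probability 0
        joint[y] = p
    total = sum(joint.values())
    posterior = {y: (p / total if total > 0 else 0.0) for y, p in joint.items()}
    return joint, posterior

# Hypothetical query point with the placeholder feature names from above:
joint, posterior = joint_and_posterior(prior, cpt, {"f1": 0, "f2": 1})
label = max(sorted(posterior), key=posterior.get)  # argmax; sorting first breaks ties alphabetically
```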
The training data is repeated here for your convenience:
Following the previous questions, now use Laplace Smoothing with strength k = 3 to estimate the prior P(Y) for the same data.
Y | P(Y) |
A | [q3.1] |
B | [q3.2] |
C | [q3.3] |
Use Laplace Smoothing with strength k = 3 to estimate the conditional probability distributions below (again, the second and third are done for you).
Value | Y | P(Value | Y) |
0 | A | [q3.4] |
1 | A | [q3.5] |
0 | B | [q3.6] |
1 | B | [q3.7] |
0 | C | [q3.8] |
1 | C | [q3.9] |
Value | Y | P(Value | Y) |
0 | A | 0.625 |
1 | A | 0.375 |
0 | B | 0.333 |
1 | B | 0.667 |
0 | C | 0.400 |
1 | C | 0.600 |
Value | Y | P(Value | Y) |
0 | A | 0.500 |
1 | A | 0.500 |
0 | B | 0.200 |
1 | B | 0.800 |
0 | C | 0.500 |
1 | C | 0.500 |
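Laplace smoothing with strength k adds k imaginary occurrences of every outcome before normalizing, so the smoothed prior is P(Y = y) = (count(y) + k) / (N + k·|Y|), and each conditional entry is P(value | y) = (count(value, y) + k) / (count(y) + k·V), where V is the number of possible values of that feature (2 for a binary feature). A minimal sketch with placeholder counts, not the homework data:

```python
def laplace_prior(label_counts, k):
    """label_counts: dict class -> count.  Returns the add-k smoothed prior P(Y)."""
    n = sum(label_counts.values())
    return {y: (c + k) / (n + k * len(label_counts)) for y, c in label_counts.items()}

def laplace_conditional(value_counts, k, num_values=2):
    """value_counts: dict value -> count within one class (for one feature).
    Returns the add-k smoothed P(value | class) over num_values possible values."""
    n = sum(value_counts.values())
    return {v: (c + k) / (n + k * num_values) for v, c in value_counts.items()}

# Hypothetical counts: a class seen 5 times out of 15, with 3 classes and k = 3
print(round(laplace_prior({"A": 5, "B": 6, "C": 4}, k=3)["A"], 3))  # (5 + 3) / (15 + 9) = 0.333
```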
Now consider again the new data point. Use the Laplace-Smoothed version of your classifier to determine the joint probability of each cause Y with this new data point, along with the posterior probability of Y given the new data:
Y | Joint probability |
A | [q4.1] |
B | [q4.2] |
C | [q4.3] |
Y | Posterior probability |
A | [q4.4] |
B | [q4.5] |
C | [q4.6] |
What label does your (Laplace-Smoothed) classifier give to the new data point? (Break ties alphabetically.) Enter a single capital letter.
[q4.7]
Consider a context-free grammar with the following rules (assume that S is the start symbol):
S → NP VP
NP → DT NN
NP → NP PP
PP → IN NP
VP → VB NP
DT → the
NN → man
NN → dog
NN → cat
NN → park
VB → saw
IN → in
IN → with
IN → under
How many parse trees are there under this grammar for the sentence: the man saw the dog in the park?
Following the previous question, how many parse trees are there for the sentence: the man saw the dog in the park with the cat?
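One way to check a hand count of parse trees is a CKY-style chart that stores, for every span and nonterminal, the number of derivations; the grammar above is already in binary form, so it can be entered directly. The sketch below is illustration only, not part of the assignment.

```python
from collections import defaultdict

# Grammar from the question: binary rules and lexical (terminal) rules.
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "IN", "NP"), ("VP", "VB", "NP")]
LEXICAL = {"the": ["DT"], "man": ["NN"], "dog": ["NN"], "cat": ["NN"],
           "park": ["NN"], "saw": ["VB"], "in": ["IN"], "with": ["IN"],
           "under": ["IN"]}

def count_parses(words, start="S"):
    """CKY chart where chart[(i, j)][X] = number of parse trees of X over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for pos in LEXICAL.get(w, []):
            chart[(i, i + 1)][pos] += 1
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[(i, j)][parent] += (chart[(i, split)][left]
                                              * chart[(split, j)][right])
    return chart[(0, n)][start]

# Count for the first sentence in the question:
print(count_parses("the man saw the dog in the park".split()))
```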
Consider the following PCFG (probabilities for each rule are shown after the rule):
S → NP VP 1.0
PP → P NP 1.0
VP → V NP 0.6
VP → VP PP 0.4
P → with 0.8
P → in 0.2
V → saw 0.7
V → look 0.3
NP → NP PP 0.3
NP → Astronomers 0.12
NP → ears 0.18
NP → saw 0.02
NP → stars 0.18
NP → telescopes 0.2
What is the probability of the best parse tree for the sentence: Astronomers saw stars with ears?
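The probability of a parse tree under a PCFG is the product of the probabilities of all rules used in it, and the best-parse probability is the maximum of that product over all parses. Below is a Viterbi-CKY sketch using the grammar above (illustration only).

```python
from collections import defaultdict

# PCFG from the question: (parent, left, right, prob) binary rules and (parent, word, prob) lexical rules.
BINARY = [("S", "NP", "VP", 1.0), ("PP", "P", "NP", 1.0),
          ("VP", "V", "NP", 0.6), ("VP", "VP", "PP", 0.4),
          ("NP", "NP", "PP", 0.3)]
LEXICAL = [("P", "with", 0.8), ("P", "in", 0.2), ("V", "saw", 0.7),
           ("V", "look", 0.3), ("NP", "Astronomers", 0.12), ("NP", "ears", 0.18),
           ("NP", "saw", 0.02), ("NP", "stars", 0.18), ("NP", "telescopes", 0.2)]

def best_parse_prob(words, start="S"):
    """chart[(i, j)][X] = probability of the best parse of X over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for parent, word, p in LEXICAL:
            if word == w:
                chart[(i, i + 1)][parent] = max(chart[(i, i + 1)][parent], p)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for parent, left, right, p in BINARY:
                    cand = p * chart[(i, split)][left] * chart[(split, j)][right]
                    chart[(i, j)][parent] = max(chart[(i, j)][parent], cand)
    return chart[(0, n)][start]

print(best_parse_prob("Astronomers saw stars with ears".split()))
```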
Which of the following are true of convolutional neural networks (CNNs) for image analysis?
Lasso can be interpreted as least-squares linear regression where
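For background (the answer choices are not reproduced here), one common way to write the lasso objective is least-squares loss plus an L1 penalty on the weights:

\min_{w} \; \|Xw - y\|_2^2 + \lambda \|w\|_1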
Suppose we are given data comprising points of several different classes. Each class has a different probability distribution from which the sample points are drawn. We do not have the class labels. We use k-means clustering to try to guess the classes. Which of the following circumstances would undermine its effectiveness?
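For reference, the k-means procedure referred to here is Lloyd's algorithm: alternately assign points to the nearest centroid and recompute each centroid as the mean of its cluster. A minimal sketch with hypothetical 2-D data:

```python
import random

def kmeans(points, k, iters=100):
    """Lloyd's algorithm: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda c: dist2(p, centroids[c]))].append(p)
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids

# Hypothetical 2-D data with two visible groups:
print(kmeans([(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)], k=2))
```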